Goto

Collaborating Authors

 monocular camera


3D Mapping Using a Lightweight and Low-Power Monocular Camera Embedded inside a Gripper of Limbed Climbing Robots

Okawara, Taku, Nishibe, Ryo, Kasano, Mao, Uno, Kentaro, Yoshida, Kazuya

arXiv.org Artificial Intelligence

Limbed climbing robots are designed to explore challenging vertical walls, such as the skylights of the Moon and Mars. In such robots, the primary role of a hand-eye camera is to accurately estimate 3D positions of graspable points (i.e., convex terrain surfaces) thanks to its close-up views. While conventional climbing robots often employ RGB-D cameras as hand-eye cameras to facilitate straightforward 3D terrain mapping and graspable point detection, RGB-D cameras are large and consume considerable power. This work presents a 3D terrain mapping system designed for space exploration using limbed climbing robots equipped with a monocular hand-eye camera. Compared to RGB-D cameras, monocular cameras are more lightweight, compact structures, and have lower power consumption. Although monocular SLAM can be used to construct 3D maps, it suffers from scale ambiguity. To address this limitation, we propose a SLAM method that fuses monocular visual constraints with limb forward kinematics. The proposed method jointly estimates time-series gripper poses and the global metric scale of the 3D map based on factor graph optimization. We validate the proposed framework through both physics-based simulations and real-world experiments. The results demonstrate that our framework constructs a metrically scaled 3D terrain map in real-time and enables autonomous grasping of convex terrain surfaces using a monocular hand-eye camera, without relying on RGB-D cameras. Our method contributes to scalable and energy-efficient perception for future space missions involving limbed climbing robots. See the video summary here: https://youtu.be/fMBrrVNKJfc


Aucamp: An Underwater Camera-Based Multi-Robot Platform with Low-Cost, Distributed, and Robust Localization

Xu, Jisheng, Lin, Ding, Fong, Pangkit, Fang, Chongrong, Duan, Xiaoming, He, Jianping

arXiv.org Artificial Intelligence

This paper introduces an underwater multi-robot platform, named Aucamp, characterized by cost-effective monocular-camera-based sensing, distributed protocol and robust orientation control for localization. We utilize the clarity feature to measure the distance, present the monocular imaging model, and estimate the position of the target object. We achieve global positioning in our platform by designing a distributed update protocol. The distributed algorithm enables the perception process to simultaneously cover a broader range, and greatly improves the accuracy and robustness of the positioning. Moreover, the explicit dynamics model of the robot in our platform is obtained, based on which, we propose a robust orientation control framework. The control system ensures that the platform maintains a balanced posture for each robot, thereby ensuring the stability of the localization system. The platform can swiftly recover from an forced unstable state to a stable horizontal posture. Additionally, we conduct extensive experiments and application scenarios to evaluate the performance of our platform. The proposed new platform may provide support for extensive marine exploration by underwater sensor networks.


Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps

Wu, Tianfu, Fu, Jiaqi, Meng, Wugang, Cho, Sungjin, Zhan, Huanzhe, Zhang, Fumin

arXiv.org Artificial Intelligence

Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator to maintain stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting human operator.


Reinforcement Learning-Based Monocular Vision Approach for Autonomous UAV Landing

Houichime, Tarik, Amrani, Younes EL

arXiv.org Artificial Intelligence

This paper introduces an innovative approach for the autonomous landing of Unmanned Aerial Vehicles (UAVs) using only a front-facing monocular camera, therefore obviating the requirement for depth estimation cameras. Drawing on the inherent human estimating process, the proposed method reframes the landing task as an optimization problem. The UAV employs variations in the visual characteristics of a specially designed lenticular circle on the landing pad, where the perceived color and form provide critical information for estimating both altitude and depth. Reinforcement learning algorithms are utilized to approximate the functions governing these estimations, enabling the UAV to ascertain ideal landing settings via training. This method's efficacy is assessed by simulations and experiments, showcasing its potential for robust and accurate autonomous landing without dependence on complex sensor setups. This research contributes to the advancement of cost-effective and efficient UAV landing solutions, paving the way for wider applicability across various fields.


Unified Human Localization and Trajectory Prediction with Monocular Vision

Luan, Po-Chien, Gao, Yang, Demonsant, Celine, Alahi, Alexandre

arXiv.org Artificial Intelligence

Conventional human trajectory prediction models rely on clean curated data, requiring specialized equipment or manual labeling, which is often impractical for robotic applications. The existing predictors tend to overfit to clean observation affecting their robustness when used with noisy inputs. In this work, we propose MonoTransmotion (MT), a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. Our framework has two main modules: Bird's Eye View (BEV) localization and trajectory prediction. The BEV localization module estimates the position of a person using 2D human poses, enhanced by a novel directional loss for smoother sequential localizations. The trajectory prediction module predicts future motion from these estimates. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios made of noisy inputs. We validate our MT network on both curated and non-curated datasets. On the curated dataset, MT achieves around 12% improvement over baseline models on BEV localization and trajectory prediction. On real-world non-curated dataset, experimental results indicate that MT maintains similar performance levels, highlighting its robustness and generalization capability. The code is available at https://github.com/vita-epfl/MonoTransmotion.


Event-Based Adaptive Koopman Framework for Optic Flow-Guided Landing on Moving Platforms

Banday, Bazeela, Sah, Chandan Kumar, Keshavan, Jishnu

arXiv.org Artificial Intelligence

This paper presents an optic flow-guided approach for achieving soft landings by resource-constrained unmanned aerial vehicles (UAVs) on dynamic platforms. An offline data-driven linear model based on Koopman operator theory is developed to describe the underlying (nonlinear) dynamics of optic flow output obtained from a single monocular camera that maps to vehicle acceleration as the control input. Moreover, a novel adaptation scheme within the Koopman framework is introduced online to handle uncertainties such as unknown platform motion and ground effect, which exert a significant influence during the terminal stage of the descent process. Further, to minimize computational overhead, an event-based adaptation trigger is incorporated into an event-driven Model Predictive Control (MPC) strategy to regulate optic flow and track a desired reference. A detailed convergence analysis ensures global convergence of the tracking error to a uniform ultimate bound while ensuring Zeno-free behavior. Simulation results demonstrate the algorithm's robustness and effectiveness in landing on dynamic platforms under ground effect and sensor noise, which compares favorably to non-adaptive event-triggered and time-triggered adaptive schemes.


Pose, Velocity and Landmark Position Estimation Using IMU and Bearing Measurements

Wang, Miaomiao, Tayebi, Abdelhamid

arXiv.org Artificial Intelligence

This paper investigates the estimation problem of the pose (orientation and position) and linear velocity of a rigid body, as well as the landmark positions, using an inertial measurement unit (IMU) and a monocular camera. First, we propose a globally exponentially stable (GES) linear time-varying (LTV) observer for the estimation of body-frame landmark positions and velocity, using IMU and monocular bearing measurements. Thereafter, using the gyro measurements, some landmarks known in the inertial frame and the estimates from the LTV observer, we propose a nonlinear pose observer on $\SO(3)\times \mathbb{R}^3$. The overall estimation system is shown to be almost globally asymptotically stable (AGAS) using the notion of almost global input-to-state stability (ISS). Interestingly, we show that with the knowledge (in the inertial frame) of a small number of landmarks, we can recover (under some conditions) the unknown positions (in the inertial frame) of a large number of landmarks. Numerical simulation results are presented to illustrate the performance of the proposed estimation scheme.


MCGMapper: Light-Weight Incremental Structure from Motion and Visual Localization With Planar Markers and Camera Groups

Xie, Yusen, Huang, Zhenmin, Chen, Kai, Zhu, Lei, Ma, Jun

arXiv.org Artificial Intelligence

Structure from Motion (SfM) and visual localization in indoor texture-less scenes and industrial scenarios present prevalent yet challenging research topics. Existing SfM methods designed for natural scenes typically yield low accuracy or map-building failures due to insufficient robust feature extraction in such settings. Visual markers, with their artificially designed features, can effectively address these issues. Nonetheless, existing marker-assisted SfM methods encounter problems like slow running speed and difficulties in convergence; and also, they are governed by the strong assumption of unique marker size. In this paper, we propose a novel SfM framework that utilizes planar markers and multiple cameras with known extrinsics to capture the surrounding environment and reconstruct the marker map. In our algorithm, the initial poses of markers and cameras are calculated with Perspective-n-Points (PnP) in the front-end, while bundle adjustment methods customized for markers and camera groups are designed in the back-end to optimize the 6-DOF pose directly. Our algorithm facilitates the reconstruction of large scenes with different marker sizes, and its accuracy and speed of map building are shown to surpass existing methods. Our approach is suitable for a wide range of scenarios, including laboratories, basements, warehouses, and other industrial settings. Furthermore, we incorporate representative scenarios into simulations and also supply our datasets with pose labels to address the scarcity of quantitative ground-truth datasets in this research field. The datasets and source code are available on GitHub.


DisBeaNet: A Deep Neural Network to augment Unmanned Surface Vessels for maritime situational awareness

Vemula, Srikanth, Franco, Eulises, Frye, Michael

arXiv.org Artificial Intelligence

Intelligent detection and tracking of the vessels on the sea play a significant role in conducting traffic avoidance in unmanned surface vessels(USV). Current traffic avoidance software relies mainly on Automated Identification System (AIS) and radar to track other vessels to avoid collisions and acts as a typical perception system to detect targets. However, in a contested environment, emitting radar energy also presents the vulnerability to detection by adversaries. Deactivating these Radiofrequency transmitting sources will increase the threat of detection and degrade the USV's ability to monitor shipping traffic in the vicinity. Therefore, an intelligent visual perception system based on an onboard camera with passive sensing capabilities that aims to assist USV in addressing this problem is presented in this paper. This paper will present a novel low-cost vision perception system for detecting and tracking vessels in the maritime environment. This novel low-cost vision perception system is introduced using the deep learning framework. A neural network, DisBeaNet, can detect vessels, track, and estimate the vessel's distance and bearing from the monocular camera. The outputs obtained from this neural network are used to determine the latitude and longitude of the identified vessel.


Opti-Acoustic Semantic SLAM with Unknown Objects in Underwater Environments

Singh, Kurran, Hong, Jungseok, Rypkema, Nicholas R., Leonard, John J.

arXiv.org Artificial Intelligence

Despite recent advances in semantic Simultaneous Localization and Mapping (SLAM) for terrestrial and aerial applications, underwater semantic SLAM remains an open and largely unaddressed research problem due to the unique sensing modalities and the object classes found underwater. This paper presents an object-based semantic SLAM method for underwater environments that can identify, localize, classify, and map a wide variety of marine objects without a priori knowledge of the object classes present in the scene. The method performs unsupervised object segmentation and object-level feature aggregation, and then uses opti-acoustic sensor fusion for object localization. Probabilistic data association is used to determine observation to landmark correspondences. Given such correspondences, the method then jointly optimizes landmark and vehicle position estimates. Indoor and outdoor underwater datasets with a wide variety of objects and challenging acoustic and lighting conditions are collected for evaluation and made publicly available. Quantitative and qualitative results show the proposed method achieves reduced trajectory error compared to baseline methods, and is able to obtain comparable map accuracy to a baseline closed-set method that requires hand-labeled data of all objects in the scene.